Evaluating Proposed Intersection Upgrade Locations in the City of Charlotte, North Carolina
This project is part of a GIS course assessment challenge using RStudio, focused on helping the City of Charlotte decide where to upgrade intersections after receiving $4.47 million to reduce pedestrian injuries and deaths.
Project Summary
Pedestrians are considered a vulnerable road user group
In the City of Charlotte, a U.S. Department of Transportation federal grant program funds initiatives to prevent roadway deaths and serious injuries by upgrading and installing equipment to reduce or eliminate pedestrian-involved crashes
Analysis is needed to evaluate 22 intersections around the City of Charlotte for upgrades, as the project is scheduled to receive additional funds in October; project reviews will therefore continue through the rest of the year
This project focuses only on the areas where high pedestrian-related crash densities exist
Point pattern analysis and spatial autocorrelation are chosen to summarise patterns in the crash location data
Using DBSCAN for point pattern analysis helps to create clusters of points
Using spatial autocorrelation (Global Moran's I and Local Moran's I) helps to measure how much the presence of one feature is related to the presence of similar features in nearby areas
Overlaying the Moran's I results with the proposed intersection upgrade locations
Datasets
Main dataset: Fatal_or_Serious_Injury_Crashes (shp, points), 2019-2023. I use the pedestrian-related crashes from 2019-2023, with key columns such as year, month, day, and crash type. Despite the pandemic occurring in 2020-2022, this analysis includes those years because they still provide information about crash occurrences and patterns.
Main dataset: Safe_Streets_For_All_Grant_Projects (shp, points). A dataset of intersection improvement locations that contains gisid, location, and descriptions of the upgrade projects.
Census_Commuting_Block_Groups (shp, polygons). Contains geo-names for a total of 555 block groups; this data is chosen because it offers a finer level of detail and is specifically used to analyze commuting patterns.
All datasets are projected to NAD83 / North Carolina (ftUS), in US feet.
Questions to Address
Do pedestrian-related crashes exhibit a specific pattern across the City of Charlotte between 2019-2023?
Do pedestrian-related crashes vary across the City of Charlotte between 2019-2023?
Are there any areas that have not been covered by the intersection improvement upgrade plans?
Step 1: Pre-Processing the data
```r
# Load libraries
library(here)
library(dplyr)
library(readr)
library(janitor)
library(sf)
library(sp)
library(tmap)
library(spatstat)
library(fpc)
library(stringr)
library(dbscan)
library(ggplot2)
library(spdep)
```

Note: dbscan::dbscan masks fpc::dbscan, so the package is specified explicitly when DBSCAN is run later.
```r
# The proposed intersection upgrade points
upgrade_points <- st_read(here::here('Data', 'Safe_Streets_For_All_Grant_Projects.shp')) %>%
  clean_names()
```

Reading layer `Safe_Streets_For_All_Grant_Projects' from data source `E:\portfolio\nooriza.github.io\Data\Safe_Streets_For_All_Grant_Projects.shp' using driver `ESRI Shapefile'
Simple feature collection with 22 features and 6 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 1427738 ymin: 483074.8 xmax: 1485674 ymax: 584838.9
Projected CRS: NAD83 / North Carolina (ftUS)
```r
# The Commuting Block Group boundary
Admin <- st_read(here::here('Data', 'Census_Commuting_Block_Groups.shp')) %>%
  clean_names()
```
Reading layer `Census_Commuting_Block_Groups' from data source
`E:\portfolio\nooriza.github.io\Data\Census_Commuting_Block_Groups.shp'
using driver `ESRI Shapefile'
Simple feature collection with 555 features and 20 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 1384287 ymin: 460659.4 xmax: 1536942 ymax: 647866.3
Projected CRS: NAD83 / North Carolina (ftUS)
Step 2: Filtering and Transforming the Data
What I found in the data and what I will do:
1. There are no NA values across rows in the key columns of the main dataset
2. Filter to pedestrian-related crashes only
3. Make sure the coordinates are appropriate
```r
# Select key columns in the crash data
selected_crash <- crash[, c("objectid", "crsh_id", "date_val_y", "date_val_1",
                            "day_of_w_1", "case_num", "crsh_type", "crash_type",
                            "crsh_levl", "primary_1", "latitude", "longitude",
                            "geometry")]

# Select pedestrian crashes and make sure the points are located correctly
# in North Carolina
selected_crash1 <- selected_crash %>%
  dplyr::filter(crash_type == "Pedestrian") %>%
  dplyr::filter(longitude < 0 & latitude > 0)
# dplyr::distinct(crsh_id, .keep_all = TRUE) # no duplicate crashes recorded

# Reproject all SHPs into a projected CRS in metres using EPSG 32119
selected_crash1 <- st_transform(selected_crash1, 32119)
st_crs(selected_crash1)
```
```r
# Spatial join between the census commuting block group boundary and the
# crash points, plus calculation of crash density
crash_sj <- selected_Admin %>%
  mutate(n = lengths(st_intersects(., selected_crash1))) %>%  # count pedestrian crashes per block
  mutate(area = st_area(.)) %>%
  mutate(density = n / area)

# Check to see how it looks
tmap_mode("plot")
```
tmap mode set to plotting
```r
tm_shape(crash_sj) +
  tm_polygons("density",
              style = "jenks", palette = "Reds",
              title = "Crash Density 2019-2023")
```
Overall, the density map of pedestrian-related crash occurrences shows that high numbers of crashes, whether injury or fatal, are located in the city centre, but they do not form a compact cluster; instead they scatter outwards in a windrose-like shape. The map therefore shows a distinctive pattern, as the dark red (high density) areas follow a slightly linear alignment. I suspect this pattern is related to specific types of road networks, so I will overlay the data on an OSM basemap.
```r
# Overlay with OSM
tmap_mode("view")
```
tmap mode set to interactive viewing
```r
tm_shape(crash_sj) +
  tm_polygons("density",
              style = "jenks", palette = "Reds",
              title = "Crash Density 2019-2023",
              alpha = 0.5) +  # transparency is usually a bad move in map
                              # visualisation, but I use it here to see the basemap
  tm_shape(selected_crash1) +
  tm_dots(col = 'red') +
  tm_basemap("OpenStreetMap")
```
By overlaying the pedestrian crashes, the density layer, and the basemap, high occurrences are often found in commuting blocks near main roads such as Central Avenue, North Sharon Amity Road, and South Cedar Street. The map also shows that Uptown (the city centre) has high occurrences. Uptown is well known as the most urbanised neighbourhood and as the Central Business District [1], so it hosts many important services that attract mobility.
The analysis would be deeper with a road network dataset, as it would help identify the road hierarchies along which high densities occur. Adding land use data would also yield a richer analysis, as it would help identify the dominant land uses where this density is mainly located. However, both are outside the scope of this short project.
Step 3: Analysis
A. Point Pattern Analysis with DBSCAN
The purpose is to discover clusters in space. Performing DBSCAN requires:
a. eps
Using Ripley's K to get an initial value for the epsilon, then validating it with a KNN distance plot to find a suitable eps at the 'knee'
b. min pts
I choose the median of the number of occurrences (n), which spans from 0 to 5 per commuting block; so min pts is 3
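For intuition about what eps and min pts control, the core DBSCAN idea can be sketched in a few lines of base R. This is a simplified toy for illustration only, not the fpc implementation used below; the function name `toy_dbscan` and the sample points are my own.

```r
# Toy DBSCAN sketch in base R (illustration only; use fpc::dbscan or
# dbscan::dbscan on real data). A point with at least minPts neighbours
# within eps (counting itself) is a "core" point; core points within eps
# of each other chain into one cluster, and unreachable points stay noise.
toy_dbscan <- function(pts, eps, minPts) {
  n <- nrow(pts)
  d <- as.matrix(dist(pts))          # pairwise distances
  labels <- rep(0L, n)               # 0 = unassigned / noise
  cluster <- 0L
  for (i in seq_len(n)) {
    if (labels[i] != 0L) next
    neigh <- which(d[i, ] <= eps)
    if (length(neigh) < minPts) next # not a core point; may be claimed later
    cluster <- cluster + 1L
    labels[i] <- cluster
    queue <- setdiff(neigh, i)
    while (length(queue) > 0) {
      j <- queue[1]; queue <- queue[-1]
      if (labels[j] == 0L) {
        labels[j] <- cluster
        j_neigh <- which(d[j, ] <= eps)
        if (length(j_neigh) >= minPts)  # j is also core: keep expanding
          queue <- union(queue, j_neigh[labels[j_neigh] == 0L])
      }
    }
  }
  labels
}

# Two tight groups 100 units apart plus one distant point
set.seed(42)
pts <- rbind(cbind(rnorm(10), rnorm(10)),
             cbind(rnorm(10) + 100, rnorm(10)),
             c(500, 500))
toy_dbscan(pts, eps = 5, minPts = 3)
# group 1 -> cluster 1, group 2 -> cluster 2, the distant point stays 0 (noise)
```

A small eps or large minPts leaves more points as noise; a large eps merges the groups, which is why the knee-based choice below matters.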
```r
# Set a window as the study area boundary
window <- as.owin(selected_Admin)
plot(window)
```
```r
# Create an sp object from the points data
selected_crash1 <- selected_crash1 %>%
  as(., 'Spatial')

# Create a ppp object
selected_crash1.ppp <- ppp(x = selected_crash1@coords[, 1],
                           y = selected_crash1@coords[, 2],
                           window = window)
```
Intuition to find Epsilon with Ripley’s K
```r
# Plot the Ripley's K graph
K <- selected_crash1.ppp %>%
  Kest(., correction = "border") %>%
  plot()
```
From the Ripley's K plot, we can see that the observed curve for the pedestrian-related crash data lies above the theoretical values for a Poisson distribution (the red line). This means that up to a distance of about 2500 metres, pedestrian crash cases are closer to each other than expected under randomness, showing clustering. The knee is identified as the largest bulge in the graph, starting at around 2500 m, so we use this as the epsilon.
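As a sanity check on this reading, the quantity Kest estimates can be sketched without edge correction in base R: K̂(r) = (A/n²) · Σᵢ Σⱼ≠ᵢ 1(dᵢⱼ ≤ r), and clustering shows up as K̂(r) rising above the CSR benchmark πr². The helper `khat` and the simulated points are my own; spatstat's corrected estimator should be preferred on real data.

```r
# Naive Ripley's K estimate (no edge correction): A = window area,
# n = number of points; counts ordered pairs within distance r.
khat <- function(pts, r, A) {
  n <- nrow(pts)
  d <- as.matrix(dist(pts))
  diag(d) <- Inf                     # exclude self-pairs
  sapply(r, function(ri) (A / n^2) * sum(d <= ri))
}

set.seed(1)
unif <- cbind(runif(200), runif(200))   # CSR in the unit square
r <- seq(0.02, 0.1, by = 0.02)
round(khat(unif, r, A = 1), 4)          # for CSR these two lines should be close
round(pi * r^2, 4)
```

For a clustered pattern, khat at small r sits well above πr², which is exactly the gap seen in the plot above.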
```r
# First extract the points from the spatial points data frame
selected_crash1Points <- selected_crash1 %>%
  coordinates(.) %>%
  as.data.frame()

# Now run the DBSCAN analysis
db <- selected_crash1Points %>%
  fpc::dbscan(., eps = 2500, MinPts = 3)

# A more robust way to find the eps value is the 'knee' in the KNN plot
selected_crash1Points %>%
  dbscan::kNNdistplot(., k = 3)
```
```r
# Now plot the results
plot(db, selected_crash1Points, main = "DBSCAN Output", frame = F)
plot(selected_Admin$geometry, add = T)
```
The KNN distance plot above validates this: after calculating the average distance from every point to its k = 3 nearest neighbours, the largest spike in the distance-sorted points happens at around 2500, confirming the knee value from Ripley's K, where the distance to neighbours increases sharply.
Interpretation of the point pattern analysis using DBSCAN: the analysis identifies 5 distinct clusters, covering the centre (red), northwest (blue), southeast (green), north (purple), and northeast (blue). (The colours may differ if run on another PC, but the compass directions still identify each cluster.)
B. Spatial Autocorrelation : Moran’s I
a. Performing Global Moran's I: the purpose is to measure similarity between nearby crash densities and provide a single value giving a general sense of overall spatial autocorrelation
b. Performing Local Moran's I: this identifies the specific locations with similar or dissimilar density values
```r
# First calculate the centroids of all block groups
# (use the spatially joined data with density, i.e. the polygons)
centroid_block <- crash_sj %>%
  st_centroid() %>%
  st_geometry()
```
Warning: st_centroid assumes attributes are constant over geometries
```r
# plot(centroid_block, axes = TRUE)
```
I create the spatial weights matrix using queen contiguity in this analysis. Queen contiguity means that each commuting block polygon is considered a neighbour if it shares either an edge or a corner with another commuting block polygon.
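The idea can be sketched on a hypothetical 3x3 grid of square cells, where the queen neighbours and the row-standardised ("W" style) weights used below can be built by hand (poly2nb and nb2mat do this from real polygons; the grid here is my own toy):

```r
# Queen contiguity on a toy 3x3 grid of unit squares: two cells are
# neighbours if their row and column indices both differ by at most 1.
cells <- expand.grid(row = 1:3, col = 1:3)
n <- nrow(cells)
W <- matrix(0, n, n)
for (i in 1:n) for (j in 1:n) {
  if (i != j &&
      abs(cells$row[i] - cells$row[j]) <= 1 &&
      abs(cells$col[i] - cells$col[j]) <= 1) W[i, j] <- 1
}
rowSums(W)            # corner cells have 3 neighbours, edge cells 5, the centre 8
Wrs <- W / rowSums(W) # row standardisation ("W" style)
sum(Wrs)              # every row sums to 1, so the total equals n (9 here)
```

The same logic explains the 555 returned by the real weights matrix later: one unit of influence per block group, shared equally among its neighbours.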
```r
# Make neighbours (use the spatially joined data with density, i.e. the polygons)
community_nb <- crash_sj %>%
  poly2nb(., queen = T)

# Let's see the summary
summary(community_nb)
```
Neighbour list object:
Number of regions: 555
Number of nonzero links: 3510
Percentage nonzero weights: 1.139518
Average number of links: 6.324324
Link number distribution:
2 3 4 5 6 7 8 9 10 11 12 13
4 16 59 106 132 111 64 40 12 7 2 2
4 least connected regions:
93 170 192 264 with 2 links
2 most connected regions:
88 193 with 13 links
The summary tells us that the average number of neighbours is 6.324, which means that each polygon in the crash_sj dataset (crash occurrences spatially joined with commuting blocks) has about 6 neighbouring polygons under the queen contiguity method. Let's see how the matrix looks across the Commuting Blocks of the City of Charlotte.
```r
# Plot them
plot(community_nb, st_geometry(centroid_block), col = "red")
# Add a map underneath
plot(crash_sj$geometry, add = T)
```
```r
# Make the weights matrix, row standardised ("W"): this ensures that each
# neighbourhood's influence is equally distributed among its neighbours
community_nb.lw <- community_nb %>%
  nb2mat(., style = "W")
sum(community_nb.lw)
```
[1] 555
This operation returns 555, which means that every Commuting Block is linked to other Commuting Blocks and, because each row of weights sums to 1, the strength of these connections is evenly distributed among each block's neighbours
```r
# Make a weights list for Global Moran's I
community_nb.lw <- community_nb %>%
  nb2listw(., style = "W")
```
Global Moran’s I
In Global Moran's I, a value of 1 = clustered, 0 = no pattern, and -1 = dispersed
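To make that scale concrete, the statistic itself, I = (n/ΣΣw) · Σᵢⱼ wᵢⱼ zᵢ zⱼ / Σ zᵢ², can be computed from first principles on a toy example (`morans_i` and the 6-area row layout are my own; spdep::moran.test computes the same quantity plus the significance test):

```r
# Moran's I from first principles: W is a weights matrix, x the values,
# z the mean-centred values.
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# 6 areas in a row; each neighbours the adjacent ones
W <- matrix(0, 6, 6)
for (i in 1:5) { W[i, i + 1] <- 1; W[i + 1, i] <- 1 }

morans_i(c(1, 1, 1, 9, 9, 9), W)   # 0.6: similar values sit next to each other
morans_i(c(1, 9, 1, 9, 1, 9), W)   # -1: perfectly alternating (dispersed)
```

The 0.133 reported below for the crash densities sits between these extremes, much closer to "no pattern" than to the clustered toy case.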
```r
# Run Global Moran's I on the crash density values
as.numeric(crash_sj$density) %>%
  moran.test(., community_nb.lw)
```

Moran I test under randomisation
data: .
weights: community_nb.lw
Moran I statistic standard deviate = 5.6759, p-value = 6.9e-09
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.1332969630 -0.0018050542 0.0005665789
The interpretation of the Global Moran's I result is: the value 0.133 suggests a weak but positive spatial autocorrelation. This means there is a slight tendency for similar values to be located near each other, but the clustering is not very strong. To identify the specific areas with this slight tendency to share similar values, I will use Local Moran's I.
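Local Moran's I splits that single number into one value per area: Iᵢ = (zᵢ/m₂) · Σⱼ wᵢⱼ zⱼ, which is large and positive where an area and its neighbours are both above (or both below) the mean. A hand-rolled sketch on the same toy row of areas (`local_morans_i` is my own helper; spdep::localmoran additionally returns z-scores and p-values):

```r
# Local Moran's I sketch with row-standardised weights:
# Ii = (z_i / m2) * spatial lag of z at i, where m2 = sum(z^2) / n.
local_morans_i <- function(x, W) {
  z <- x - mean(x)
  m2 <- sum(z^2) / length(x)
  Wrs <- W / rowSums(W)
  (z / m2) * as.vector(Wrs %*% z)
}

# 6 areas in a row; each neighbours the adjacent ones
W <- matrix(0, 6, 6)
for (i in 1:5) { W[i, i + 1] <- 1; W[i + 1, i] <- 1 }
Ii <- local_morans_i(c(1, 1, 1, 9, 9, 9), W)
Ii        # 1 1 0 0 1 1: high inside each homogeneous run, 0 at the boundary
mean(Ii)  # with row-standardised weights this equals global Moran's I
```

This is why mapping the local z-scores below picks out the specific high-high pockets that the single global value of 0.133 averages away.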
```r
# Run Local Moran's I on the crash density values
I_localdensity <- as.numeric(crash_sj$density) %>%
  localmoran(., community_nb.lw) %>%
  as_tibble()
```
```r
# Join the result back to the initial data for further visualisation
crash_sj_moran <- crash_sj %>%
  mutate(densityI = as.numeric(I_localdensity$Ii)) %>%
  mutate(densityIz = as.numeric(I_localdensity$Z.Ii))
```
Step 4 : Visualization
```r
# Set the breaks using the standard-deviation rule, with wide outer bounds
# to ensure that all possible z-scores are included
breaks1 <- c(-1000, -2.58, -1.96, -1.65, 1.65, 1.96, 2.58, 1000)

# Set the colours
library(RColorBrewer)
MoranColours <- brewer.pal(8, "YlOrRd")
```
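For reference, these fixed breaks bin each z-score into two-tailed significance bands (±1.65 ≈ 90%, ±1.96 ≈ 95%, ±2.58 ≈ 99% confidence); what tmap does with style = "fixed" is essentially base R's cut(). The sample z-scores and labels here are my own illustration:

```r
# Classify z-scores into significance bands, as tmap's fixed style does
breaks1 <- c(-1000, -2.58, -1.96, -1.65, 1.65, 1.96, 2.58, 1000)
z <- c(-3.1, -0.2, 1.7, 2.0, 2.7)
labs <- c("< -2.58", "-2.58 to -1.96", "-1.96 to -1.65", "not significant",
          "1.65 to 1.96", "1.96 to 2.58", "> 2.58")
as.character(cut(z, breaks = breaks1, labels = labs))
# "< -2.58" "not significant" "1.65 to 1.96" "1.96 to 2.58" "> 2.58"
```

On the map, only blocks in the outer bands are locally significant; the wide middle band (-1.65 to 1.65) is background.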
Mapping Interactive visualization
tmap_mode("view")
tmap mode set to interactive viewing
```r
tm_shape(crash_sj_moran) +
  tm_polygons("densityIz",
              style = "fixed", breaks = breaks1,
              palette = MoranColours, midpoint = NA,
              alpha = 0.5,  # again I choose transparency so the basemap
                            # can be used to identify the land use
              title = "Local Moran's I, 2019-2023 Pedestrian-related crash") +
  tm_basemap("OpenStreetMap")
```
To add block-group name labels, add this to the script above:

```r
+ tm_text("geoname",       # replace with the actual column name for block names
          size = 0.7,      # adjust text size as needed
          col = "black",   # colour of the text
          just = "center", # adjust text alignment
          shadow = TRUE)   # shadow for better visibility, optional
```
Interpretation of the Local Moran's I result:
The distinctive areas that share similar high values are located in the Uptown area [Object IDs 4003 and 4397] and in the northeast, around Commuting Block 5 in Census Tract 55.24, Mecklenburg County [Object IDs 3928, 4215]. Honestly, these results are hard to interpret.
In terms of scale, the Census Commuting Block Group does help to identify detailed areas, but pinpointing the name of an area is much easier with Census tracts. In this analysis I interpret the Local Moran's I result with the help of the OSM basemap. The high-density areas that share similar values are:
The Uptown area / city centre
Southeast of the city centre, around Evergreen Nature Preserve, which potentially attracts many pedestrians visiting the park
Northeast of the city centre, around James Martin Middle School, Governor's Village Academy, and Julius Chambers High School, where many students potentially commute to and from school
Overlaying with Upgrade Points From the Government
```r
tm_shape(crash_sj_moran) +
  tm_polygons("densityIz",
              style = "fixed", breaks = breaks1,
              palette = MoranColours, midpoint = NA,
              alpha = 0.5,  # again, to see the basemap
              title = "Local Moran's I, 2019-2023 Pedestrian-related crash") +
  tm_shape(upgrade_points) +
  tm_dots(col = "blue", size = 0.1)
```
From the map we can see that if the Government wants to evaluate the intersection upgrade locations, the upgrades should include the three areas identified by Local Moran's I, representing the CBD, parks, and schools, as these locations have not been included in the upgrade plan. However, the result would be more reliable if it were also verified against road sign distribution data (including signs such as Accessible Pedestrian Signals (APS), Leading Pedestrian Intervals (LPI), Rectangular Rapid Flashing Beacons (RRFB), etc.)
Results
The DBSCAN analysis identifies 5 distinct clusters between 2019-2023, with notable cluster groups in the centre and to the northwest, southeast, north, and northeast of the city centre.
Pedestrian-related crashes show some variation across the City of Charlotte between 2019-2023. The Moran’s I value of 0.133 indicates weak positive spatial autocorrelation, meaning there is a slight tendency for crashes to cluster in certain areas rather than being randomly distributed. While this suggests some localized patterns, the low magnitude of the value implies that the clustering is not very strong.
The Local Moran's I results in the northeast and southeast of the City of Charlotte align with the notion that the further someone gets from the uptown area, the bigger the dangers for pedestrians [2]. Both areas are located far from the city centre.
There are still areas with the highest densities of pedestrian-related crashes that have not been covered by the intersection improvement upgrade plans. All of these locations were identified by Local Moran's I.
The strength of this analysis lies in the use of spatial autocorrelation, which identifies specific areas requiring attention. Additionally, the use of an OSM basemap provides context, offering insights into the actual activities occurring in areas where high crash densities neighbour each other. I find that a more appropriate layer for analyzing pedestrian-related crashes would be network data [road segments], which would allow crash density to be calculated per road segment. While analysis using the total area of the commuting block group is helpful, it tends to generalize across the entire block group.
The analysis could be more thorough with a dataset on current road sign availability, allowing more specific recommendations on the types of intersection improvements needed, as well as a land use dataset to better explain potential driving factors based on land use types. The land use identified in this analysis is based on OpenStreetMap, whose survey date is not known.